
    MetAMOS: A modular and open source metagenomic assembly and analysis pipeline

    © 2013 Treangen et al. We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open reading frames and taxonomic or functional annotations. MetAMOS can help reduce the assembly errors commonly encountered when assembling metagenomic samples, and it improves taxonomic assignment accuracy while also reducing computational cost. MetAMOS can be downloaded from: https://github.com/treangen/MetAMOS

    Re-Assembly of the Genome of Francisella tularensis Subsp. holarctica OSU18

    Francisella tularensis is a highly infectious intracellular human pathogen and the causative agent of tularemia. It occurs in several major subtypes, including subspecies holarctica (type B), from which the live vaccine strain is derived. F. tularensis is classified as a category A biodefense agent, in part because a relatively small number of organisms can cause severe illness. Three complete genomes of subspecies holarctica have been sequenced and deposited in public archives, of which OSU18 was the first and is the only strain for which a scientific publication has appeared [1]. We re-assembled the OSU18 genome using both de novo and comparative assembly techniques and found that the published sequence contains two large inversion mis-assemblies. We generated a corrected assembly of the entire genome, along with detailed information on the placement of individual reads within the assembly. This assembly will provide a more accurate basis for future comparative studies of this pathogen.

    Efficient oligonucleotide probe selection for pan-genomic tiling arrays

    Background: Array comparative genomic hybridization is a fast and cost-effective method for detecting, genotyping, and comparing the genomic sequences of unknown bacterial isolates. This method, as with all microarray applications, requires adequate coverage of probes targeting the regions of interest. An unbiased tiling of probes across the entire length of the genome is the most flexible design approach. However, such a whole-genome tiling requires that the genome sequence be known in advance. For the accurate analysis of uncharacterized bacteria, an array must query a fully representative set of sequences from the species' pan-genome. Prior microarrays have included only a single strain per array or only the conserved sequences of gene families; these arrays omit potentially important genes and sequence variants from the pan-genome. Results: This paper presents a new probe selection algorithm (PanArray) that can tile multiple whole genomes using a minimal number of probes. Unlike arrays built on clustered gene families, PanArray uses an unbiased, probe-centric approach that does not rely on annotations, gene clustering, or multiple alignments. Instead, probes are evenly tiled across all sequences of the pan-genome at a consistent level of coverage. To minimize the required number of probes, probes conserved across multiple strains in the pan-genome are selected first, and additional probes are used only where necessary to span polymorphic regions of the genome. The viability of the algorithm is demonstrated by array designs for seven different bacterial pan-genomes and, in particular, by the design of a 385,000-probe array that fully tiles the genomes of 20 different Listeria monocytogenes strains with overlapping probes at greater than twofold coverage. Conclusion: PanArray is an oligonucleotide probe selection algorithm for tiling multiple genome sequences using a minimal number of probes. 
It is capable of fully tiling all genomes of a species on a single microarray chip. These unique pan-genome tiling arrays provide maximum flexibility for the analysis of both known and uncharacterized strains. https://doi.org/10.1186/1471-2105-10-29
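The selection principle described above (conserved probes first, extra probes only where sequences diverge) can be illustrated with a greedy set-cover sketch. This is not the PanArray implementation: the fixed tiling slots every `step` bases, the Hamming-distance matching rule controlled by `max_mismatch`, and the function names are hypothetical choices made for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def select_probes(genomes, probe_len, step, max_mismatch=0):
    """Greedy probe selection over a pan-genome (hypothetical sketch).

    Every genome is divided into tiling slots of `probe_len` bases starting
    every `step` bases. A candidate probe fills a slot if it differs from the
    slot sequence at no more than `max_mismatch` positions.
    """
    slots = {}
    for g, seq in enumerate(genomes):
        for start in range(0, len(seq) - probe_len + 1, step):
            slots[(g, start)] = seq[start:start + probe_len]

    # Candidate probes are the distinct slot sequences themselves.
    candidates = sorted(set(slots.values()))
    covers = {
        p: {s for s, target in slots.items() if hamming(p, target) <= max_mismatch}
        for p in candidates
    }

    uncovered = set(slots)
    chosen = []
    while uncovered:
        # Probes shared by many strains fill the most empty slots, so they are
        # picked first; strain-specific probes are added only for what remains.
        best = max(candidates, key=lambda p: len(covers[p] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


genomes = ["ACGTACGT", "ACGTACGA"]
print(select_probes(genomes, probe_len=4, step=2))                  # ['ACGT', 'GTAC', 'ACGA']
print(select_probes(genomes, probe_len=4, step=2, max_mismatch=1))  # ['ACGA', 'GTAC']
```

Greedy set cover is a heuristic: it does not guarantee the smallest possible probe set, only a logarithmic-factor approximation, which is usually acceptable when the goal is a small rather than provably minimal array.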

    Evolutionary and Experimental Assessment of Novel Markers for Detection of Xanthomonas euvesicatoria in Plant Samples

    BACKGROUND: Bacterial spot-causing xanthomonads (BSX) are quarantine phytopathogenic bacteria responsible for heavy losses in tomato and pepper production. Despite research on improved plant spraying methods and resistant cultivars, the use of healthy plant material is still considered the most effective bacterial spot control measure. Therefore, rapid and efficient detection methods are crucial for early detection of these phytopathogens. METHODOLOGY: In this work, we selected and validated novel DNA markers for reliable detection of the BSX Xanthomonas euvesicatoria (Xeu). Xeu-specific DNA regions were selected using two online applications, CUPID and Insignia. To facilitate the selection of putative DNA markers, a customized C program was designed to retrieve the regions output by both applications. The in silico validation was further extended to provide insight into the origin of these Xeu-specific regions through analyses of chromosomal location, GC content, codon usage and synteny. Primer pairs were designed to amplify those regions, and PCR validation assays showed that most primers allowed positive amplification with different Xeu strains. The obtained amplicons were labeled and used as probes in dot blot assays, which allowed the probes to be tested against a collection of 12 non-BSX Xanthomonas and 23 other phytopathogenic bacteria. These assays confirmed the specificity of the selected DNA markers. Finally, we designed and tested a duplex PCR assay and an inverted dot blot platform for culture-independent detection of Xeu in infected plants. SIGNIFICANCE: This study details a selection strategy able to provide a large number of Xeu-specific DNA markers. As demonstrated, the selected markers can detect Xeu in infected plants both by PCR and by hybridization-based assays coupled with automatic data analysis. 
Furthermore, this work contributes to the implementation of more efficient DNA-based methods for bacterial diagnostics.
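Two of the computational steps mentioned above, finding the regions reported by both tools and computing GC content, can be sketched as follows. The study used a customized C program whose logic is not described; this Python version is a hypothetical illustration that assumes each tool reports half-open `(start, end)` coordinates on the same reference sequence and that intervals within each tool's list do not overlap one another.

```python
def shared_regions(regions_a, regions_b):
    """Intersect two lists of half-open (start, end) intervals on one sequence.

    Assumes the intervals within each list do not overlap one another.
    """
    a, b = sorted(regions_a), sorted(regions_b)
    i = j = 0
    shared = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # the two intervals overlap
            shared.append((start, end))
        # Advance whichever interval ends first; it cannot overlap anything further.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Hypothetical coordinates from two marker-discovery tools.
tool_a = [(0, 10), (20, 30)]
tool_b = [(5, 25)]
print(shared_regions(tool_a, tool_b))  # [(5, 10), (20, 25)]
print(gc_content("ACGT"))              # 0.5
```

The two-pointer sweep runs in linear time after sorting, which matters little for a handful of candidate regions but scales to genome-wide lists.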

    Feature-by-Feature – Evaluating De Novo Sequence Assembly

    The whole-genome sequence assembly (WGSA) problem is one of the most studied problems in computational biology. Despite the availability of a plethora of tools (i.e., assemblers), all claiming to have solved the WGSA problem, little has been done to systematically compare their accuracy and power. Traditional methods rely on standard metrics and read simulation: on the one hand, metrics like N50 and number of contigs focus only on size without proportionately emphasizing the correctness of the assembly; on the other hand, comparisons performed on simulated datasets can be highly biased by the unrealistic assumptions of the underlying read generator. Recently, the Feature Response Curve (FRC) method was proposed to assess overall assembly quality and correctness: FRC transparently captures the trade-off between contig quality and contig size. Nevertheless, the relationships among the different features and their relative importance remain unknown; in particular, FRC cannot account for the correlation among features. We analyzed the correlation among different features in order to better describe their relationships and their importance in gauging assembly quality and correctness. Using multivariate techniques such as principal and independent component analysis, we were able to estimate the “excess-dimensionality” of the feature space. Moreover, principal component analysis allowed us to show how poorly the acclaimed N50 metric describes assembly quality. Applying independent component analysis, we identified a subset of features that better describes the assemblers' performance. We demonstrated that by focusing on a reduced set of highly informative features, we can use the FRC to better describe and compare the performance of different assemblers. 
Moreover, as a by-product of our analysis, we found that evaluation based on simulated data, obtained with state-of-the-art simulators, often leads to unrealistic results.
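The contrast between N50 and the FRC can be made concrete with a short sketch. N50 looks only at contig lengths, while an FRC point also charges each contig for its suspicious "features" (e.g. mis-assembly signals). The abstract does not give the paper's exact feature definitions, so this follows the general FRC description (contigs taken from longest to shortest until a feature budget is exceeded) and should be read as an approximation; the function names and `(length, feature_count)` input format are assumptions.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def frc_point(contigs, threshold, genome_size):
    """One point of a feature response curve (approximate sketch).

    `contigs` is a list of (length, feature_count) pairs. Contigs are taken from
    longest to shortest while their cumulative feature count stays within
    `threshold`; the covered length is returned as a fraction of `genome_size`.
    """
    covered = 0
    features = 0
    for length, n_features in sorted(contigs, key=lambda c: c[0], reverse=True):
        if features + n_features > threshold:
            break
        features += n_features
        covered += length
    return covered / genome_size


# Hypothetical assembly: the longest contig carries most of the error features.
contigs = [(400, 5), (300, 0), (200, 2), (100, 1)]
print(n50([length for length, _ in contigs]))  # 300
print(frc_point(contigs, 0, 1000))             # 0.0
print(frc_point(contigs, 5, 1000))             # 0.7
```

Sweeping `threshold` from zero upward traces the full curve; N50, by contrast, stays at 300 no matter how many features each contig carries.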

    High-Throughput Phenotypic Characterization of Pseudomonas aeruginosa Membrane Transport Genes

    The deluge of data generated by genome sequencing has led to an increasing reliance on bioinformatic predictions, since the traditional experimental approach of characterizing gene function one gene at a time cannot keep pace with the sequence-based discovery of novel genes. We used Biolog Phenotype MicroArrays to identify phenotypes of gene knockout mutants in the opportunistic pathogen and versatile soil bacterium Pseudomonas aeruginosa in a relatively high-throughput fashion. Seventy-eight P. aeruginosa mutants defective in predicted sugar and amino acid membrane transporter genes were screened, and clear phenotypes were identified for 27 of these. In all cases, these phenotypes were confirmed by independent growth assays on minimal media. Using qRT-PCR, we demonstrate that the expression levels of 11 of these transporter genes were induced 4- to 90-fold by the substrates identified via phenotype analysis. Overall, the experimental data showed the bioinformatic predictions to be largely correct in 22 out of 27 cases, and led to the identification of novel transporter genes and a potentially new histamine catabolic pathway. Thus, rapid phenotype identification assays are an invaluable tool for confirming and extending bioinformatic predictions.
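The abstract does not state how fold induction was calculated from the qRT-PCR data. A widely used approach is the comparative Ct (2^-ΔΔCt) method, sketched below under the assumptions of a single reference (housekeeping) gene and near-100% amplification efficiency for both primer pairs. The Ct values in the example are invented for illustration and are not data from the study.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ΔΔCt method.

    Assumes roughly 100% amplification efficiency for target and reference,
    so each cycle of difference corresponds to a twofold change.
    """
    delta_treated = ct_target_treated - ct_ref_treated  # normalize to reference gene
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control
    return 2 ** -delta_delta


# Hypothetical Ct values: the transporter gene crosses threshold two cycles
# earlier (relative to the reference) when grown on its substrate.
print(fold_change_ddct(20, 15, 22, 15))  # 4
```

If primer efficiencies differ noticeably, an efficiency-corrected model (such as the Pfaffl method) is the usual replacement for the fixed base of 2.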

    Recent and historical recombination in the admixed Norwegian Red cattle breed

    Background: Comparison of recent patterns of recombination derived from linkage maps with historical patterns of recombination from linkage disequilibrium (LD) could help identify genomic regions affected by strong artificial selection, which would appear as regions of reduced recent recombination. Norwegian Red cattle (NRF) make an interesting case study for investigating these patterns, because NRF is an admixed breed with an extensively recorded pedigree. NRF have been under strong artificial selection for traits such as milk and meat production, fertility and health. Measures of LD are also crucial for determining the number of markers required for association mapping studies, and estimates of recombination rate can be used to assess the quality of genomic assemblies. Results: A dataset containing more than 17,000 genome-wide distributed SNPs and 2,600 animals was used to assess recombination rates and LD in NRF. Although low LD, measured by r², was observed in NRF relative to some of the breeds from which this breed originates, reports from breeds other than those assessed in this study have described a more rapid decline in r² at short distances than was found in NRF. The rate of decline in r² for NRF suggested that, to obtain an expected r² of at least 0.5 between markers and a causal polymorphism in genome-wide association studies, approximately one SNP every 15 kb, or a total of 200,000 SNPs, would be required. 
For well-known quantitative trait loci (QTLs) for milk production traits on Bos taurus chromosomes 1, 6 and 20, map length based on historic recombination was greater than map length based on recent recombination in NRF. Further, positions for 130 previously unpositioned contigs from the assembly of the bovine genome sequence (Btau_4.0), found using comparative sequence analysis, were validated by linkage analysis, and 28% of these positions corresponded to extreme values of population recombination rate. Conclusion: While LD is reduced in NRF compared to some of the breeds from which this admixed breed originated, it is elevated over short distances compared to some other cattle breeds. Genomic regions in NRF where map length based on historic recombination was greater than map length based on recent recombination coincided with some well-known QTL regions for milk production traits. Linkage analysis combined with comparative sequence analysis and detection of regions with extreme values of population recombination rate proved valuable for detecting problematic regions in the Btau_4.0 genome assembly.
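The marker-density estimate is simple arithmetic: 200,000 SNPs spaced every 15 kb span 200,000 × 15 kb = 3,000 Mb, on the order of the size of the bovine genome. The sketch below shows that calculation together with the standard r² measure of LD between two biallelic loci, computed from phased haplotypes. The function names and the 0/1 allele coding are assumptions; the study's own LD estimation procedure is not described in the abstract.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes coded as (0/1, 0/1)."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # A monomorphic locus makes r^2 undefined; report 0.0 in that case.
    return d * d / denom if denom > 0 else 0.0


def snps_needed(genome_length_bp, spacing_bp):
    """Number of evenly spaced markers needed to cover a genome at a given spacing."""
    return genome_length_bp // spacing_bp


# Perfect association between loci versus independent loci (hypothetical haplotypes).
print(ld_r2([(0, 0), (0, 0), (1, 1), (1, 1)]))  # 1.0
print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))  # 0.0
# Assumed genome length of 3 Gb at one SNP every 15 kb.
print(snps_needed(3_000_000_000, 15_000))       # 200000
```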

    The Complete Genome Sequence of Escherichia coli EC958: A High Quality Reference Sequence for the Globally Disseminated Multidrug Resistant E. coli O25b:H4-ST131 Clone

    Escherichia coli ST131 is now recognised as a leading contributor to urinary tract and bloodstream infections in both community and clinical settings. Here we present the complete, annotated genome of E. coli EC958, which was isolated from the urine of a patient presenting with a urinary tract infection in the Northwest region of England and represents the best-characterised ST131 strain. Sequencing was carried out using the Pacific Biosciences platform, which provided sufficient depth and read length to produce a complete genome without the need for other technologies. The discovery of spurious contigs within the assembly that correspond to site-specific inversions in the tail fibre regions of prophages demonstrates the potential of this technology to reveal dynamic evolutionary mechanisms. E. coli EC958 belongs to the major subgroup of ST131 strains that produce the CTX-M-15 extended-spectrum β-lactamase, are fluoroquinolone resistant and encode the fimH30 type 1 fimbrial adhesin. This subgroup includes the Indian strain NA114 and the North American strain JJ1886. A comparison of the genomes of EC958, JJ1886 and NA114 revealed that differences in the arrangement of genomic islands, prophages and other repetitive elements in the NA114 genome are not biologically relevant but are due to misassembly. The availability of a high-quality uropathogenic E. coli ST131 genome provides a reference for understanding this multidrug-resistant pathogen and will facilitate novel functional, comparative and clinical studies of the E. coli ST131 clonal lineage.

    Simple synthesis of ³²P-labelled inositol hexakisphosphates for study of phosphate transformations

    In many soils, inositol hexakisphosphate in its various forms is as abundant as inorganic phosphate. The organismal and geochemical processes that exchange phosphate between inositol hexakisphosphate and other pools of soil phosphate are poorly defined, as are the organisms and enzymes involved. We reasoned that a simple enzymic synthesis of inositol hexakisphosphate labeled with ³²P would greatly facilitate the study of soil inositol phosphate transformations when combined with robust HPLC separations of different inositol phosphates.